Fault Tolerance through Re-Execution in Multiscalar Architecture

Authors

  • Faisal Rashid
  • Kewal K. Saluja
  • Parameswaran Ramanathan
Abstract

Multi-threading and multiscalar execution are two fundamental microarchitecture approaches that are expected to keep processors on the existing performance gain curve. Both approaches assume that integrated circuits with over a billion transistors will become available in the near future. Such large integrated circuits imply reduced design tolerances and hence increased failure probability. Conventional hardware redundancy techniques for achieving the desired reliability in computation may severely limit the performance of such high-performance processors. Hence we need to study novel methods that exploit the inherent redundancy of these microarchitectures, without unduly affecting performance, to provide correct program execution and/or detect failures (permanent or transient) that can occur in the hardware. This paper proposes a time redundancy technique suitable for multiscalar architectures. A multiscalar architecture usually has several processing units to exploit the instruction-level parallelism that exists in a given program. The technique uses a majority of the processing units to execute the program as in the traditional multiscalar paradigm, while using the remaining processing units to re-execute the committed instructions. By comparing the results from the two program executions, errors caused by permanent or transient faults in the processing units can be detected. Simulation results presented in this paper demonstrate that this can be achieved with about 5-15% performance degradation...
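
The re-execute-and-compare idea in the abstract can be illustrated with a small Python sketch. This is not the authors' simulator: the toy instruction format, the function name `run_with_reexecution`, and the single-bit fault model are all assumptions made for illustration only.

```python
import operator

def run_with_reexecution(program, regs, faulty_index=None):
    """Toy model (hypothetical): each instruction runs on a 'primary' unit,
    is committed, and is then re-executed on a spare 'checker' unit using
    the same operands. Returns the final registers and the indices of
    instructions whose two results disagree."""
    regs = dict(regs)  # work on a copy so the caller's state is untouched
    mismatches = []
    for i, (dest, op, a, b) in enumerate(program):
        x, y = regs[a], regs[b]
        primary = op(x, y)
        if i == faulty_index:
            primary ^= 1          # assumed fault model: flip the lowest result bit
        checker = op(x, y)        # re-execution of the committed instruction
        if primary != checker:
            mismatches.append(i)  # detection only; no recovery is modeled
        regs[dest] = primary      # the primary result is what gets committed
    return regs, mismatches

# Hypothetical three-instruction program over named registers.
program = [
    ("r2", operator.add, "r0", "r1"),
    ("r3", operator.mul, "r2", "r1"),
    ("r4", operator.sub, "r3", "r0"),
]
initial = {"r0": 2, "r1": 3}

clean_regs, clean_errors = run_with_reexecution(program, initial)
faulty_regs, faulty_errors = run_with_reexecution(program, initial, faulty_index=1)
```

In this sketch the fault-free run reports no mismatches, while the run with an injected fault flags only the instruction at which the two executions diverge. A permanent fault could be modeled the same way by corrupting every result of one unit.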

Similar resources

Optimization of WS-BPEL Workflows through Business Process Re-Engineering Patterns

With the advent of XML-based SOA, WS-BPEL swiftly became a widely accepted standard for modeling business processes. Although SOA is said to embrace the principle of business agility, BPEL process definitions are still manually crafted into their final executable version. While SOA has proven to be a giant leap forward in building flexible IT systems, this static BPEL workflow model should be e...

Architectural Plan for Constructing Fault Tolerable Workflow Engines Based on Grid Service

In this paper the design and implementation of fault tolerable architecture for scientific workflow engines is presented. The engines are assumed to be implemented as composite web services. Current architectures for workflow engines do not make any considerations for substituting faulty web services with correct ones at run time. The difficulty is to rollback the execution state of the workflo...

Using speculative execution for fault tolerance in a real-time system

Achieving fault-tolerance using a primary-backup approach involves overhead of recovery such as activating the backup and propagating execution states, which may affect the timeliness properties of real-time systems. We propose a semi-passive architecture for fault-tolerance and show that speculative execution can enhance overall performance and hence shorten the recovery time in the presence of...

Providing Adaptive Fault Tolerance through the Reconfigurable ARMOR Architecture of Chameleon

This paper presents the reconfigurable architecture of Chameleon, a software infrastructure that provides fault tolerance services across a network of unreliable nodes. Chameleon provides fault tolerance services through processes we call ARMORs (Adaptive Reconfigurable Mobile Objects of Reliability). ARMORs have a hierarchical structure that supports a well-defined approach for constructing ARM...

Publication date: 2000